
[Feat] Enable VAE parallel in HunyuanImage3 #3091

Merged
Gaohan123 merged 8 commits into vllm-project:main from Fishermanykx:yukexiong/hunyuan_vae_opt on May 20, 2026


@Fishermanykx (Contributor) commented Apr 24, 2026

Summary

Enable VAE parallel support in HunyuanImage3.

Current changes:

  • add a distributed Hunyuan VAE wrapper at vllm_omni/diffusion/distributed/autoencoders/autoencoder_kl_hunyuan.py
  • wire HunyuanImage3Pipeline to use the distributed autoencoder wrapper
  • remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py

The unified deploy YAML is handled in #3172.

Validation

  • static checks only so far (py_compile, diff checks)
  • runtime validation is still pending

Test Plan

Tested on 4x Ascend NPUs

server

vllm serve $model --omni --port "8031" \
    --log-stats \
    --stage-configs-path "vllm_omni/platforms/npu/stage_configs/hunyuan_image3_t2i.yaml" 

vae_patch_parallel_size is set to 4
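As a rough illustration of what a patch-parallel size of 4 implies, the sketch below simulates spreading VAE tiles across 4 ranks, running each rank's share, and gathering the results back into the original tile order. The round-robin assignment policy and every name here (TileTaskSketch, assign_tiles, run_patch_parallel) are hypothetical; the PR's DistributedVaeMixin may partition and gather work differently, and real ranks would run concurrently and use a collective gather rather than a single loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TileTaskSketch:
    """Hypothetical stand-in for a tile work item (not the PR's TileTask)."""
    index: int
    data: List[float]


def assign_tiles(num_tiles: int, world_size: int) -> Dict[int, List[int]]:
    """Assign tile indices to ranks round-robin (an assumed policy)."""
    plan: Dict[int, List[int]] = {rank: [] for rank in range(world_size)}
    for idx in range(num_tiles):
        plan[idx % world_size].append(idx)
    return plan


def run_patch_parallel(
    tasks: List[TileTaskSketch],
    world_size: int,
    exec_fn: Callable[[TileTaskSketch], List[float]],
) -> List[List[float]]:
    """Simulate per-rank tile execution followed by an ordered gather."""
    plan = assign_tiles(len(tasks), world_size)
    # Each rank processes only its own tiles (run sequentially in this simulation).
    per_rank = {
        rank: {idx: exec_fn(tasks[idx]) for idx in indices}
        for rank, indices in plan.items()
    }
    # "Gather": merge every rank's results and restore the original tile order,
    # so a downstream merge/blend step sees the same sequence as a serial run.
    gathered: Dict[int, List[float]] = {}
    for results in per_rank.values():
        gathered.update(results)
    return [gathered[idx] for idx in range(len(tasks))]


def toy_decode(task: TileTaskSketch) -> List[float]:
    """Toy stand-in for decoding one tile."""
    return [2.0 * x for x in task.data]


if __name__ == "__main__":
    tiles = [TileTaskSketch(i, [float(i)]) for i in range(16)]
    print(assign_tiles(16, 4)[0])  # [0, 4, 8, 12]
    assert run_patch_parallel(tiles, 4, toy_decode) == [toy_decode(t) for t in tiles]
```

Keeping results keyed by tile index means the parallel path produces exactly the serial tile order, which is what lets a single merge routine serve both paths.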

client

curl -X POST http://localhost:8031/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": 
    "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.",
    "num_inference_steps": 2,
    "guidance_scale": "1.0",
    "n": 1,
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
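The same request can be issued from Python using only the standard library. This is a sketch that assumes the endpoint, JSON body, and response shape used by the curl command above (the image in data[0].b64_json); the helper names are hypothetical. Building the body with json.dumps also escapes any newlines inside the prompt, which a hand-written JSON string does not.

```python
import base64
import json
import urllib.request

DEFAULT_URL = "http://localhost:8031/v1/images/generations"


def build_generation_request(prompt: str, url: str = DEFAULT_URL, **overrides) -> urllib.request.Request:
    """Build a POST request mirroring the curl example's JSON body."""
    body = {
        "prompt": prompt,
        "num_inference_steps": 2,
        "guidance_scale": "1.0",  # sent as a string, exactly as in the curl example
        "n": 1,
        "size": "1024x1024",
        "seed": 42,
    }
    body.update(overrides)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # json.dumps escapes newlines in the prompt
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body: bytes) -> bytes:
    """Equivalent of `jq -r '.data[0].b64_json' | base64 -d`."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["data"][0]["b64_json"])


def generate_to_file(prompt: str, path: str = "output.png") -> None:
    """Send the request to a running server and write the decoded image bytes."""
    req = build_generation_request(prompt)
    with urllib.request.urlopen(req) as resp:
        with open(path, "wb") as f:
            f.write(decode_first_image(resp.read()))
```

With the server from the command above running, generate_to_file("...") writes output.png in the same way as the curl pipeline.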

Test Result

[Image: output]

[Image: output]

VAE decode time: 625.7 ms -> 355 ms (about 1.76x faster)

Without VAE parallel: [image: wo-vae]

With VAE parallel: [image: w vae]

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from 421d557 to c69899e on April 24, 2026 03:36
@Fishermanykx Fishermanykx changed the title from "[WIP][Feat.] Enable VAE parallel in HunyuanImage3" to "[Feat.] Enable VAE parallel in HunyuanImage3" on Apr 24, 2026
@Fishermanykx Fishermanykx marked this pull request as ready for review April 24, 2026 03:36
@Fishermanykx (Contributor, Author) commented Apr 24, 2026

PTAL @gcanlin @Semmer2

@Fishermanykx Fishermanykx changed the title from "[Feat.] Enable VAE parallel in HunyuanImage3" to "[Feat] Enable VAE parallel in HunyuanImage3" on Apr 24, 2026
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from ee9b0b3 to a4502c4 on April 24, 2026 07:23
@hsliuustc0106 (Collaborator) commented:

Does it work on GPU as well?
Does it affect accuracy?

@Bounty-hunter (Contributor) left a comment:

LGTM

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 378289a to 2eacaf2 on May 11, 2026 12:03
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 5 times, most recently from ae99ecc to c6f0e06 on May 14, 2026 02:42
@BLANKETusers (Contributor) commented:

Test Plan

Tested on 2x H200 GPUs

VAE (with --vae-use-tiling)

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_vae \
  --vae-use-tiling

No VAE (without --vae-use-tiling)

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_no_vae

Test Result

VAE: [image: output_0_0]

No VAE: [image: output_0_0]

CLIP score: 99.85/100
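The comment does not say exactly how this CLIP score was computed. One common way to compare two images is the cosine similarity of their CLIP image embeddings, scaled to a 0 to 100 range; the sketch below assumes that definition and takes precomputed embedding vectors (producing them with a CLIP model is out of scope here). The function name and the clamp at 0 are assumptions.

```python
import math
from typing import Sequence


def clip_similarity_score(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """100 * cosine similarity of two embeddings, clamped at 0 (assumed definition)."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    # Negative similarities are clamped to 0 (an assumed convention).
    return max(0.0, 100.0 * dot / (norm_a * norm_b))
```

Under this definition, a score of 99.85 would mean the two image embeddings point in almost the same direction.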

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from c6f0e06 to 8c4b866 on May 14, 2026 09:13
@Gaohan123 Gaohan123 added this to the v0.22.0 milestone May 14, 2026
@Gaohan123 (Collaborator) left a comment:

Here are some suggestions:

  1. Please add a simple unit test (UT) for it.
  2. I didn't see any NPU-related modifications, which is inconsistent with your PR description.

@Fishermanykx (Contributor, Author) replied:

> remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py

  1. Done.
  2. Removing the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py was done in #2979, which had not been merged when this PR was opened. After I rebased, that change no longer appears in this PR.

@Fishermanykx Fishermanykx requested a review from yenuo26 as a code owner May 15, 2026 07:02
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from f338f4d to 014b54b on May 15, 2026 07:05
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 014b54b to 272bc98 on May 15, 2026 08:46
@Gaohan123 Gaohan123 added the ready label (label to trigger buildkite CI) on May 18, 2026
@Gaohan123 (Collaborator) left a comment:

LGTM. Thanks

@Gaohan123 Gaohan123 enabled auto-merge (squash) May 18, 2026 09:26
@Gaohan123 Gaohan123 disabled auto-merge May 18, 2026 09:49
…oencoder_kl_hunyuan.py to clarify what each case validates, without changing test behavior

Signed-off-by: zzh <943967662@qq.com>
@BLANKETusers BLANKETusers force-pushed the yukexiong/hunyuan_vae_opt branch from 4f1aa1b to 61fc92a on May 19, 2026 03:54
logger = init_logger(__name__)


class DistributedAutoencoderKLHunyuan(AutoencoderKLConv3D, DistributedVaeMixin):
Collaborator review comment:

from_pretrained is missing; for consistency, consider adding it.

Contributor reply: Modified.

    def encode_tile_exec(self, task: TileTask) -> torch.Tensor:
        return self.encoder(task.tensor)

    def encode_tile_merge(
Collaborator review comment:

tile_merge and encode_tile_merge are byte-for-byte identical; consider extracting a shared helper.

Contributor reply: Modified.

torch.nn.Module.__init__(self)
self.tile_latent_min_size = 2
self.tile_sample_min_size = 2
self.tile_overlap_factor = 0.0
Collaborator review comment:

tile_overlap_factor is hardcoded to 0.0 (the real default is 0.25), so the blend logic is never tested.

Contributor reply: Modified.

autoencoder_kl_hunyuan imports AutoencoderKLConv3D from the
hunyuan_image3 package, which triggers hunyuan_image3/__init__.py
to execute and import pipeline_hunyuan_image3, which in turn
imports DistributedAutoencoderKLHunyuan back from
autoencoder_kl_hunyuan before that module has finished initializing,
causing a circular import error during test collection.

Fix by moving the top-level import of DistributedAutoencoderKLHunyuan
into HunyuanImage3Pipeline.__init__ as a lazy import, so it is only
resolved at call time when both modules are fully initialized.

Signed-off-by: zzh <943967662@qq.com>
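The cycle and the fix can be reproduced in miniature. The sketch below writes a throwaway layout to a temporary directory: a package toy_hunyuan whose __init__.py eagerly imports its pipeline module (playing the role of hunyuan_image3/__init__.py), and a top-level toy_dist_vae module that subclasses a class exported by that package (playing the role of autoencoder_kl_hunyuan). All module and class names are hypothetical stand-ins. Importing toy_dist_vae first fails when the pipeline imports it at module level, and succeeds once the import moves into Pipeline.__init__.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Package __init__ eagerly imports the pipeline, like hunyuan_image3/__init__.py.
INIT_SRC = "from .conv3d import AutoencoderKLConv3D\nfrom .pipeline import Pipeline\n"
CONV3D_SRC = "class AutoencoderKLConv3D:\n    pass\n"
# The distributed wrapper subclasses a class exported by the package.
DIST_SRC = (
    "from toy_hunyuan import AutoencoderKLConv3D\n"
    "class DistributedVae(AutoencoderKLConv3D):\n"
    "    pass\n"
)
EAGER_PIPELINE = (
    "from toy_dist_vae import DistributedVae  # top-level: runs during package init\n"
    "class Pipeline:\n"
    "    def __init__(self):\n"
    "        self.vae = DistributedVae()\n"
)
LAZY_PIPELINE = (
    "class Pipeline:\n"
    "    def __init__(self):\n"
    "        from toy_dist_vae import DistributedVae  # resolved at call time\n"
    "        self.vae = DistributedVae()\n"
)
MODULES = ("toy_dist_vae", "toy_hunyuan", "toy_hunyuan.conv3d", "toy_hunyuan.pipeline")


def try_layout(pipeline_src: str) -> str:
    """Import the wrapper module first (as test collection did) and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root, "toy_hunyuan")
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_SRC)
        (pkg / "conv3d.py").write_text(CONV3D_SRC)
        (pkg / "pipeline.py").write_text(pipeline_src)
        Path(root, "toy_dist_vae.py").write_text(DIST_SRC)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            importlib.import_module("toy_dist_vae")
            pipe = importlib.import_module("toy_hunyuan").Pipeline()
            return type(pipe.vae).__name__
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            # Undo the temporary import state so the next layout starts clean.
            sys.path.remove(root)
            for name in MODULES:
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(try_layout(EAGER_PIPELINE))  # ImportError: cannot import name 'DistributedVae' ...
    print(try_layout(LAZY_PIPELINE))   # DistributedVae
```

The lazy import works because, by the time Pipeline() is called, both modules have finished executing, so the name lookup no longer hits a partially initialized module.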
…an and deduplicate tile merge

- Add missing from_pretrained classmethod for consistency with other
  distributed autoencoders (KL, Wan, QwenImage)
- Delegate encode_tile_merge to tile_merge to eliminate byte-for-byte
  duplicate code

Signed-off-by: zzh <943967662@qq.com>
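Both changes in this commit follow common Python patterns. The sketch below illustrates them with hypothetical classes (not the PR's code). First, a subclass from_pretrained classmethod reuses the parent loader through super(), so the returned object is the subclass. Second, encode_tile_merge becomes a one-line delegation to tile_merge, leaving a single merge implementation to maintain. The config dictionary stands in for real checkpoint loading, and the concatenating merge is a placeholder for real tile blending.

```python
from typing import Any, Dict, List


class BaseVaeSketch:
    """Hypothetical parent VAE with a loader classmethod."""

    def __init__(self, config: Dict[str, Any]):
        self.config = dict(config)

    @classmethod
    def from_pretrained(cls, path: str, **overrides: Any) -> "BaseVaeSketch":
        # Placeholder for reading a real config/checkpoint from `path`.
        config: Dict[str, Any] = {"path": path, "tile_overlap_factor": 0.25}
        config.update(overrides)
        return cls(config)  # `cls` is whichever class the call went through


class DistributedVaeSketch(BaseVaeSketch):
    """Hypothetical distributed wrapper."""

    @classmethod
    def from_pretrained(cls, path: str, **overrides: Any) -> "DistributedVaeSketch":
        model = super().from_pretrained(path, **overrides)  # builds a DistributedVaeSketch
        model.patch_parallel_ready = True  # hypothetical post-load setup step
        return model

    def tile_merge(self, tiles: List[List[float]]) -> List[float]:
        # Single merge implementation (placeholder: plain concatenation).
        return [value for tile in tiles for value in tile]

    def encode_tile_merge(self, tiles: List[List[float]]) -> List[float]:
        # Delegate instead of keeping a byte-for-byte copy of tile_merge.
        return self.tile_merge(tiles)
```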
…AE tests

Adjust grid_shape from (2,2) to (4,4) and tile count from 4 to 16.
When tile_overlap_factor=0.25, overlap_size becomes 1 instead of 2,
producing a denser 4x4 tile grid on the 4x4 input.

Signed-off-by: zzh <943967662@qq.com>
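The tile counts above match diffusers-style tiling arithmetic, where the stride between tile origins is int(tile_latent_min_size * (1 - tile_overlap_factor)) and a tile starts at every stride step along each axis. The sketch below assumes the Hunyuan wrapper uses that same formula (it reproduces the numbers quoted in this commit); tile_grid is a hypothetical helper.

```python
from typing import Tuple


def tile_grid(height: int, width: int, tile_latent_min_size: int,
              tile_overlap_factor: float) -> Tuple[int, Tuple[int, int], int]:
    """Return (stride, grid_shape, tile_count) for a diffusers-style tiled pass."""
    # Stride between tile origins; int() truncates toward zero.
    overlap_size = int(tile_latent_min_size * (1 - tile_overlap_factor))
    rows = len(range(0, height, overlap_size))
    cols = len(range(0, width, overlap_size))
    return overlap_size, (rows, cols), rows * cols


print(tile_grid(4, 4, 2, 0.0))   # (2, (2, 2), 4)
print(tile_grid(4, 4, 2, 0.25))  # (1, (4, 4), 16)
```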
…AE tests

With tile_latent_min_size=2 and tile_overlap_factor=0.25, blend_extent
truncates to int(0.5)=0, causing overlapping tiles with no blending
and producing misaligned 7x7 output instead of the expected 4x4.
Increasing min_size to 8 makes blend_extent=2 and keeps the tile
pipeline's math self-consistent while preserving tile_overlap_factor=0.25.
Signed-off-by: zzh <943967662@qq.com>
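The 7x7 versus 4x4 result can be reproduced along a single axis. The sketch below is a simplified 1D version of diffusers-style tiled decoding, under the assumption that the wrapper follows it: the stride is int(min_size * (1 - factor)), blend_extent is int(min_size * factor), each tile is cropped to min_size - blend_extent, and each tile's head is linearly cross-faded with the previous tile's tail. The decoder is the identity and all names are hypothetical; this illustrates the arithmetic and is not the PR's test code.

```python
from typing import Callable, List, Sequence


def blend(prev: Sequence[float], cur: Sequence[float], blend_extent: int) -> List[float]:
    """Linearly cross-fade the tail of `prev` into the head of `cur`."""
    out = list(cur)
    blend_extent = min(len(prev), len(cur), blend_extent)
    for x in range(blend_extent):
        w = x / blend_extent
        out[x] = prev[-blend_extent + x] * (1 - w) + out[x] * w
    return out


def tiled_decode_1d(z: Sequence[float], min_size: int, overlap_factor: float,
                    decode: Callable[[Sequence[float]], List[float]] = list) -> List[float]:
    overlap_size = int(min_size * (1 - overlap_factor))  # stride between tiles
    blend_extent = int(min_size * overlap_factor)        # truncates: int(0.5) == 0
    row_limit = min_size - blend_extent                  # samples kept per tile
    tiles = [decode(z[i:i + min_size]) for i in range(0, len(z), overlap_size)]
    result: List[float] = []
    for j, tile in enumerate(tiles):
        if j > 0:
            tile = blend(tiles[j - 1], tile, blend_extent)
        result.extend(tile[:row_limit])
    return result


z = [0.0, 1.0, 2.0, 3.0]
print(len(tiled_decode_1d(z, 2, 0.25)))  # 7: blend_extent is 0, so overlapping samples are duplicated
print(len(tiled_decode_1d(z, 8, 0.25)))  # 4: a single tile covers the input
```

With min_size 8, blend_extent is 2 and each kept slice is exactly one stride long, so the crops tile the input without gaps or duplicates; the identity decoder then reconstructs the input exactly.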
@Gaohan123 Gaohan123 merged commit 2917959 into vllm-project:main May 20, 2026
7 of 9 checks passed

Labels: ready (label to trigger buildkite CI)

Projects: None yet

Participants: 6